Abstract

In this paper, it is shown that two popular conceptions about the behavior of negative power law (neg-p) noise (that is, noise with a PSD $L_p(f) \propto |f|^p$ for $p<0$) are based on myth and that the reality is quite different. The first myth is that one can “fix” a neg-p divergence problem in a variance like a standard or N-sample variance simply by replacing it with an Allan or Hadamard variance without further action. The paper will show that each type of variance has a different interpretation as an error measure and that such arbitrary swapping merely masks the true problem. In the process, we will show that such variance divergences are true indicators of severe system or modeling problems that must be physically addressed, not ignored. The second myth is that one can use ensemble-based statistical estimation techniques like least squares and Kalman filters to properly estimate polynomial deterministic behavior in data containing non-highpass-filtered neg-p noise. It is demonstrated that such noise can generate highly anomalous fitting results because non-highpass-filtered neg-p noise is both infinitely correlated and non-ergodic. Thus, neg-p noise is shown to act more like systematic error than conventional noise in such cases.


I.  INTRODUCTION 

This paper will show that two popular conceptions in dealing with negative power law (neg-p) noise are based on myth and that the reality is quite different. By neg-p noise, we mean noise with a single sideband (SSB) power spectral density (PSD) $L_p(f) \propto |f|^p$ for $p<0$ [1,2]. This paper is not questioning the reality that higher order $\Delta$-variances [3], like Allan [1] and Hadamard variances [4], are convergent measures of neg-p noise [1,4]. What the paper will show is that it is a myth that one can “fix” neg-p divergence problems in common variances like standard and sample variances [5] simply by replacing them with $\Delta$-variances without further action. We will show that each type of variance is a statistical answer to a different type of error question and that arbitrarily changing variances is misleading in that it doesn’t fix the divergent answer to the original question. Furthermore, we will show that such variance divergences have physical significance: they are valid indicators of real problems that must be fixed by changing the system or the question being asked, not mathematical artifacts to be ignored.

A second myth we will address is that one can use ensemble-based statistical estimation theory, such as least squares [5] and Kalman [7] filters, on data containing neg-p noise to properly estimate true or deterministic polynomial behavior also contained in the data, unless the neg-p noise is sufficiently highpass-filtered [8-11]. We will demonstrate that fitting results in such cases cannot separate the true behavior from much of the noise, because non-highpass-filtered neg-p noise is both infinitely correlated and non-ergodic (ensemble averages are not equal to time averages).
II. MYTH 1: ONE CAN ARBITRARILY SWAP VARIANCES TO “FIX” NEG-P DIVERGENCE PROBLEMS

In  this  section,  we  will  show  that  each  type  of  variance  is  a  statistical  answer  to  a  different  type  of  error 
question.  Thus,  arbitrarily  swapping  variances  misleadingly  changes  the  question  and  does  not  eliminate  a 
divergent  answer  to  the  original  question. 

A.  Statistical  Estimation 

Statistical error measures like variances are generally defined in the context of statistical estimation. Fig. 1 and Table I describe the truth model and variables we will use in discussing statistical estimation. This model applies to least-squares fitting (LSQF) [5] and Kalman filters [6] in a posteriori form [12], as well as other similar statistical estimation techniques. We will briefly summarize this model here, and the reader is referred to [7-11] for more detail. In this model, $x(t)$ is a general data variable (not necessarily the time error) whose samples $x(t_n)$ are collected over an interval $T$. Here, $t$ and $t_n$ are ideal continuous and discrete observation times and are considered error-free. In our model, $x(t)$ is the sum of $x_c(t)$, the true or deterministic behavior, and $x_r(t)$, the contaminating error or measurement noise. An unspecified estimation technique generates a “best” estimate of $x_c(t)$ by adjusting M parameters $a_m$ in a model function $x_{a,M}(t)$ based on some fit over $x(t_n)$. Note that we will use $x_{a,M}(t)$ to describe both the model function with adjustable parameters and the final fit, depending on the context. We also note that $x_c(t)$, $x(t)$, and $x_{a,M}(t)$ can be functions of other time-dependent variables, such as temperature and pressure [13]. For simplicity, these other variables are not shown.


Fig. 1. Truth model and variables for statistical estimation.
Table I. Statistical estimation model & error measures.

Basic Variables

  Measured data: $x(t_n) = x_c(t_n) + x_r(t_n)$
  Sample times: $t_n$ (over data period $T$)
  True or deterministic behavior: $x_c(t)$
  True noise: $x_r(t)$
  M-parameter model function and final estimate of $x_c(t)$: $x_{a,M}(t)$
  (M-1)th-order polynomial model function: $x_{poly,M}(t) = \sum_m a_m (t - t_0)^m$   [m = 0:M-1]

Basic Error Measures

  True accuracy of fit: $x_{w,M}(t_n) = x_{a,M}(t_n) - x_c(t_n)$
  Data precision: $x_{j,M}(t_n) = x(t_n) - x_{a,M}(t_n)$
  Mth-order $\Delta$-measures (stability & precision under certain conditions): $\Delta(\tau)^M x(t_n)$
  1st forward difference: $\Delta(\tau) x(t_n) = x(t_n + \tau) - x(t_n)$

Variances

  Point variance (Kalman): $\sigma_q^2(t_n) = E[x_q(t_n) - E\{x_q(t_n)\}]^2$   [q = w,M or j,M]
  Average variance over $T$ (LSQF): $\sigma_q^2 = \sum_n c_n \sigma_q^2(t_n)$   [n = 1:N]
  Mth-order $\Delta$-variance: $\sigma_{\tau,M}^2(t_n) = \lambda_M E[\Delta(\tau)^M x(t_n)]^2$, with $\lambda_M^{-1} = \sum_{m=0}^{M} \left[ \frac{M!}{m!(M-m)!} \right]^2$
  $E$ = Ensemble average
  $c_n$ = Data weighting over $T$

Derived Error Measures

  Fit precision (point variance): $\sigma_{wj,M}^2(t_n) = p_d(t_n)\, \sigma_{j,M}^2(t_n)$
  Fit precision (average variance): $\sigma_{wj,M}^2 = p_d\, \sigma_{j,M}^2$
  $p_d$ and $p_d(t_n)$ are theoretically calculated from $p_d(t_n) = \sigma_{w,M}^2(t_n) / \sigma_{j,M}^2(t_n)$ and $p_d = \sigma_{w,M}^2 / \sigma_{j,M}^2$ based on a specific error model and assuming no model error.


B.  Basic  Error  Measures 

In Fig. 1 and Table I, we define the true accuracy of the fit at $t_n$ as $x_{w,M}(t_n)$; that is, the accuracy is the difference between $x_{a,M}(t_n)$ and $x_c(t_n)$. $x_{w,M}(t_n)$ is, of course, unobservable from the data alone, since a priori knowledge of $x_c(t)$ is required to generate it. The basic observable error measure at $t_n$ is the data precision $x_{j,M}(t_n)$, defined as the difference between $x(t_n)$ and $x_{a,M}(t_n)$, also given in Table I. From $x_{w,M}(t_n)$ and $x_{j,M}(t_n)$, one can form two types of theoretical variances (see Table I):

(a) $\sigma_{w,M}^2(t_n)$ and $\sigma_{j,M}^2(t_n)$ we will call point variances. These are generally used in a Kalman filter [6].

(b) $\sigma_{w,M}^2$ and $\sigma_{j,M}^2$ we will call average variances over $T$ weighted by $c_n$. These are generally used in a LSQF [5].

Note that $\sigma_{w,1}^2$ is also called the standard variance and $\sigma_{j,1}^2$ is called the sample variance when $c_n = 1/(N-1)$ and the fit solution is the sample mean $x_{a,1}(t) = N^{-1} \sum_n x(t_n)$ [5]. Note also that the above are “theoretical” or ensemble variances formed by averaging over an ensemble of data sets [14], that is, by using an ensemble averaging operator $E$ as opposed to an infinite time average operator $\langle \ldots \rangle$ over a single ensemble member [14]. Finally, note that $\sigma_{w,M}^2$ and $\sigma_{j,M}^2$ may have implicit t-dependence, because both $x_{a,M}(t_n)$ and $x_c(t)$ are not generally time-invariant [9,10].
Another set of measures used to describe random error we will call Mth-order $\Delta$-measures $\Delta(\tau)^M x(t_n)$ [3]. These and their theoretical variances $\sigma_{\tau,M}^2(t_n)$ and $\sigma_{\tau,M}^2$ are defined in Table I [10,11]. These are measures of $x(t)$ variations over the interval $\tau$. We note that $\sigma_{\tau,2}^2(t_n)$ and $\sigma_{\tau,2}^2$ are related to the Allan variance of the time error [1], and $\sigma_{\tau,3}^2$ is related to the Hadamard variance of the time error [4,15].

$\Delta$-measures are generally interpreted as measures of Mth-order stability [1,4,15]. To understand this interpretation, let us precisely define what we mean by Mth-order stability. Consider Fig. 2(a). Here, we show M+1 data points $x(t_m)$ where we have passed a model function $x_{a,M}(t)$ exactly through M of the M+1 points, excluding the point at $t_m$. This is possible because there are M points and M adjustable parameters in $x_{a,M}(t)$, so there are zero degrees of freedom [5]. We then define the Mth-order stability as the data precision $x_{j,M}(t_m)$ at the excluded point. Note from the figure that $x_{j,M}(t_m)$ can be either an extrapolation or interpolation error, depending on $t_m$. What is important about this is that one can show that

$$x_{j,M}(t_m) \propto \Delta(\tau)^M x(t_0) \qquad (1)$$

when: (a) $x_{a,M}(t)$ is $x_{poly,M}(t)$, an (M-1)th-order polynomial, and (b) the $t_m$ are separated by the time interval $\tau$ [3]. Thus, $\Delta$-variances are measures of such Mth-order stability.
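The proportionality in (1) can be checked numerically. The following is a minimal sketch, assuming unit sample spacing and using NumPy's `polyfit` as the exact-interpolation routine; the function names are illustrative and are not taken from the references. It passes an (M-1)th-order polynomial through M of M+1 points and compares the residual at the excluded point with the Mth-order forward difference. The scale factor used here, $(-1)^{M-k} / \binom{M}{k}$ for excluded index $k$, follows from the binomial expansion of the forward difference and is an assumption of this sketch, not a constant quoted from [3].

```python
import numpy as np
from math import comb

def delta_measure(x, M):
    """M-th order forward difference Delta^M x(t_0) of the first M+1 samples."""
    return np.diff(x[:M + 1], n=M)[0]

def excluded_point_residual(x, M, k, tau=1.0):
    """Residual x(t_k) - fit(t_k), with the fit passed exactly through the other M points."""
    t = tau * np.arange(M + 1)
    keep = np.arange(M + 1) != k
    coeffs = np.polyfit(t[keep], x[keep], M - 1)  # M points, M parameters: exact fit
    return x[k] - np.polyval(coeffs, t[k])

def compare(M, seed=0):
    """Return (residual, scaled difference) pairs for every excluded index k."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=M + 1)
    d = delta_measure(x, M)
    return [(excluded_point_residual(x, M, k), (-1) ** (M - k) * d / comb(M, k))
            for k in range(M + 1)]

if __name__ == "__main__":
    for M in (1, 2, 3):
        for resid, scaled in compare(M):
            print(f"M={M}: residual={resid:+.6f}  scaled difference={scaled:+.6f}")
```

Excluding the last point ($k = M$) gives the extrapolation case, and excluding an interior point gives the interpolation case; in both, the two printed columns should agree to rounding error.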


Fig. 2. $\Delta$-measures as stability and precision measures. (a) Stability: extrapolation or interpolation error at an excluded point $x(t_m)$ for a fit $x_{a,M}(t)$ over the other M points. (b) Precision: residual $x_{j,M}(t_m)$ for a fit over all points $t_0 \ldots t_M$.


Fig. 2(b) shows the data precision when all M+1 $x(t_m)$ are used to determine $x_{poly,M}(t)$ with an unweighted LSQF. In this case, one can also show that (1) is true [7,8,16]. Thus, Mth-order $\Delta$-variances can also be considered data precision measures under these conditions. The proportionality constants relating $x_{j,M}(t_m)$ to $\Delta(\tau)^M x(t_0)$ for both the stability and precision have been published [7,8]. In the precision case, the published constant is derived semi-empirically [7], but we note that Charles A. Greenhall has provided the author with a totally analytical derivation of this constant [16]. Thus, the Allan variance can also be interpreted as a measure of data precision for M+1 $x(t_m)$ spaced by $\tau$ when a time and frequency offset are removed from the data by an unweighted LSQF [6,7]. Similarly, the Hadamard variance can also be interpreted as a measure of such data precision when a time and frequency offset and the frequency drift are removed from the data by an unweighted LSQF [6,7]. This explains the sensitivity of the Allan variance and insensitivity of the Hadamard variance to deterministic frequency drift, since such drift is an unmodeled error term for M = 2, but not for M = 3 [7,8].
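The precision case and the drift argument can both be illustrated with a few lines of code. This is a sketch under stated assumptions: unit sample spacing, NumPy's `polyfit` as the unweighted LSQF, and a residual formula built from alternating binomial weights that is derived here from linear algebra rather than quoted from [7,8,16]. All function names are hypothetical.

```python
import numpy as np
from math import comb

def forward_difference(x, M):
    """M-th order forward difference of the first M+1 samples."""
    return np.diff(x[:M + 1], n=M)[0]

def lsqf_residuals(x, M, tau=1.0):
    """Residuals of an unweighted LSQF of an (M-1)th-order polynomial to M+1 samples."""
    t = tau * np.arange(M + 1)
    coeffs = np.polyfit(t, x[:M + 1], M - 1)
    return x[:M + 1] - np.polyval(coeffs, t)

def predicted_residuals(x, M):
    """Residuals predicted from Delta^M x(t_0) alone (assumed binomial weights)."""
    b = np.array([(-1) ** (M - m) * comb(M, m) for m in range(M + 1)], dtype=float)
    return b * forward_difference(x, M) / np.sum(b ** 2)

def drift_response(M, drift=1e-3, tau=1.0):
    """Delta^M applied to a pure quadratic time error (a constant frequency drift)."""
    t = tau * np.arange(M + 1)
    return forward_difference(0.5 * drift * t ** 2, M)

if __name__ == "__main__":
    x = np.random.default_rng(1).normal(size=4)
    print(lsqf_residuals(x, 3), predicted_residuals(x, 3))
    print("M=2 drift response:", drift_response(2))
    print("M=3 drift response:", drift_response(3))
```

With one degree of freedom left after the fit, every residual is a fixed multiple of the single number $\Delta(\tau)^M x(t_0)$. The drift response is non-zero for M = 2 and vanishes (to rounding) for M = 3, which mirrors the Allan and Hadamard behavior described above.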
C. The Fit Precision (Error Bars): A Derived Error Measure

From the data precision variances, one can generate what we will call the fit precision (deviate) or error bars $\sigma_{wj,M}(t_n)$ and $\sigma_{wj,M}$, as given in Table I [9-11]. These are statistical estimates of the accuracy based both on the observable data precision and the ratios $p_d(t_n)$ and $p_d$, which are theoretically calculated using a specific noise model (and assuming that $x_{a,M}(t)$ would precisely reproduce $x_c(t)$ over $T$ if no noise were present). For example, $p_{d0} = M/(N-M)$ is the $p_d$ for uncorrelated or white $x_r(t_n)$ and an unweighted LSQF [5]. We note that this $p_{d0}$ does not apply when the $x_r(t_n)$ are correlated [7-11]. We will later show that this misuse of the white $p_{d0}$ is one of the sources of unexpected fitting results when neg-p noise is present.
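A small Monte Carlo run makes the role of $p_{d0}$ concrete. The sketch below assumes $x_c(t) = 0$ (no loss of generality when $x_c$ lies in the span of the polynomial model), uniform weighting, and a random walk started at $t = 0$ as the correlated example; the function names and the trial counts are illustrative choices, not values from the paper.

```python
import numpy as np

def fit_to_residual_ratio(noise, M):
    """Ratio of ensemble-averaged fit-error power to residual power for an unweighted LSQF.

    noise: array of shape (trials, N) holding pure-noise records, i.e. x_c(t) = 0.
    """
    n_samples = noise.shape[1]
    t = np.linspace(-1.0, 1.0, n_samples)
    design = np.vander(t, M, increasing=True)   # columns 1, t, ..., t^(M-1)
    q, _ = np.linalg.qr(design)                 # orthonormal basis of the model space
    fit = noise @ q @ q.T                       # LSQF of each record = accuracy error x_w
    resid = noise - fit                         # data precision x_j
    return np.mean(fit ** 2) / np.mean(resid ** 2)

def demo(trials=2000, n_samples=64, M=3, seed=0):
    """Return (p_d0, empirical ratio for white noise, empirical ratio for random walk)."""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=(trials, n_samples))
    walk = np.cumsum(white, axis=1)             # random-walk (f^-2) noise started at t = 0
    p_d0 = M / (n_samples - M)
    return p_d0, fit_to_residual_ratio(white, M), fit_to_residual_ratio(walk, M)

if __name__ == "__main__":
    p_d0, ratio_white, ratio_walk = demo()
    print(f"p_d0 = {p_d0:.4f}, white = {ratio_white:.4f}, random walk = {ratio_walk:.2f}")
```

For white noise the empirical ratio should sit close to $M/(N-M)$; for the random walk it should be far larger, so error bars computed with $p_{d0}$ understate the true fit error.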

D.  The  Neg-p  Convergence  Properties  of  Variances 


It is well known that an average variance $\sigma_q^2$ can be represented using the spectral integral [14,17]

$$\sigma_q^2 = \int_{-\infty}^{+\infty} df \, K_q(f) \, |H_s(f)|^2 \, L_p(f) \qquad [q = w,M;\ j,M;\ \tau,M;\ \ldots] \qquad (2)$$

Here, $H_s(f)$ is a response function that describes the noise filtering properties of the system, and $K_q(f)$ is a spectral kernel that describes the $L_p(f)$ filtering properties of the variance in question independent of $H_s(f)$ [3,17]. It is well known that the $\Delta$-variance kernel $K_{\tau,M}(f)$ has $f^{2M}$ highpass (HP) filtering properties for $|f| \ll 1$ [1,3,4]. Less well known is the fact that the Mth-order data precision kernel $K_{j,M}(f)$ has the same $f^{2M}$ highpass (HP) filtering properties for $|f| \ll 1$ when $x_{a,M}(t)$ is an (M-1)th-order polynomial $x_{poly,M}(t)$ [7,8]. This result is true for general fitting techniques given only minimal restrictions [7,8]. Fig. 3 shows this $K_{j,M}(f)$ HP behavior when both a weighted and unweighted LSQF are used as the fitting technique [7,8]. Thus, both $\sigma_{j,M}^2$ and $\sigma_{\tau,M}^2$ are guaranteed to converge for neg-p noise when $2M \ge |p|$ [3,5,6-8].

On the other hand, the accuracy kernel $K_{w,M}(f)$ has no HP filtering properties. (In fact, one can show that $K_{j,M}(f) + K_{w,M}(f) = 1$ for a LSQF [11].) Thus, $\sigma_{w,M}^2$ relies totally on the HP filtering properties of $|H_s(f)|^2$ for its convergence in the presence of neg-p noise [7-11]. Because of this, there is the obvious temptation to “fix” a neg-p accuracy variance divergence simply by replacing it with one of the convergent variances without further action. In the next section, we will show that this is improper.
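The $f^{2M}$ highpass behavior of the $\Delta$-variance kernel is easy to verify numerically. The sketch below assumes the standard power response of an Mth-order forward difference, $|e^{j 2\pi f \tau} - 1|^{2M} = (2 \sin \pi f \tau)^{2M}$, and omits any normalization constant; this closed form and the function names are assumptions of the sketch rather than expressions copied from [3].

```python
import numpy as np

def delta_kernel(f, M, tau=1.0):
    """Unnormalized power response of the M-th forward difference over tau."""
    return (2.0 * np.sin(np.pi * f * tau)) ** (2 * M)

def low_frequency_slope(M, f1=1e-4, f2=1e-3, tau=1.0):
    """Log-log slope of the kernel between two frequencies with f * tau << 1."""
    k1, k2 = delta_kernel(f1, M, tau), delta_kernel(f2, M, tau)
    return np.log(k2 / k1) / np.log(f2 / f1)

def converges_at_low_f(M, p):
    """An integrand ~ f^(2M + p) near f = 0 is integrable when 2M + p > -1."""
    return 2 * M + p > -1

if __name__ == "__main__":
    for M in (1, 2, 3):
        print(f"M={M}: low-frequency slope = {low_frequency_slope(M):.4f}")
    print("M=2, p=-4 converges:", converges_at_low_f(2, -4))
    print("M=1, p=-3 converges:", converges_at_low_f(1, -3))
```

For integer $p$, the integrability condition $2M + p > -1$ is the same as $2M \ge |p|$, which is why a second-order difference tolerates noise down to $p = -4$ while a first-order one does not tolerate $p = -3$.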
Fig. 3. $K_{j,M}(f)$ HP filtering behavior for $x_{poly,M}(t)$ (panels: unweighted LSQF and weighted LSQF).


E.  Each  Variance  Addresses  a  Different  Error  Question 

We  have  just  shown  that  each  type  of  variance  addresses  a  different  statistical  error  question: 

(a)  Accuracy:  What  is  the  error  of  the  fit  from  the  true  behavior  without  noise  or  other  error  present? 

(b)  Data  Precision:  What  are  the  data  fluctuations  from  the  fitted  behavior? 

(c)  Fit  Precision:  What  is  the  estimated  fit  accuracy  based  on  the  measured  data  and  a  noise  model? 

(d)  Stability:  What  is  the  extrapolation  or  interpolation  error  to  an  additional  point  from  a  perfect  M 
point  polynomial  fit? 

From  this,  it  is  obvious  that  one  cannot  eliminate  a  divergence  problem  in  one  type  of  variance  simply  by 
arbitrarily  replacing  it  with  another  type,  since  all  this  does  is  misleadingly  change  the  question  and  leaves 
the  divergence  intact  for  the  original  question. 


Fig. 4. A physical interpretation of a neg-p accuracy divergence (ensemble members of $f^{-2}$ noise started at $t = 0$, with a fit over $T$ beginning at $t_0$; the accuracy variance grows without bound as $t_0 \to \infty$).


This leaves the problem of how to interpret the physical meaning of such a neg-p accuracy divergence. Fig. 4 shows one such interpretation. Here, we show several data ensemble members where the data consist entirely of $f^{-2}$ or random walk noise [2]. Note that the $f^{-2}$ noise process is started at finite time ($t = 0$), which is called the non-stationary (NS) picture (to be discussed later) [2,11,14]. At a time $t_0$ after the noise process has started, we then perform a one-parameter LSQF on the data over $T$ to generate our fit $x_{poly,1}(t) = a_0$. One can immediately see that something is amiss. We note that $\sigma_{j,1}$ is significantly less than $\sigma_{w,1}$. Thus, a white-noise-based fit precision $\sigma_{wj,1}$ will severely underestimate the true accuracy $\sigma_{w,1}$


41st Annual Precise Time and Time Interval (PTTI) Meeting


when $N \gg 1$. Furthermore, one can easily show that $\sigma_{w,1} \to \infty$ as $t_0 \to \infty$, while $\sigma_{j,1}$ will remain finite [7-11]. This is a physical meaning of a neg-p accuracy variance infinity: the true accuracy of a fit will become severely degraded when $t_0$ is large (which is the typical physical case). One can see from this example that using the wrong $\sigma_{wj,1}$ and $\sigma_{\tau,M}$ to represent $\sigma_{w,1}$ in this case will just mask the problem, not fix it.
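The random-walk example of Fig. 4 can be reproduced with a short simulation. This is a sketch, assuming unit sample spacing, unit-variance Gaussian steps, $x_c(t) = 0$, and uniform weighting $c_n = 1/N$; the function name and the chosen values of $t_0$ and $T$ are illustrative only.

```python
import numpy as np

def mean_fit_variances(t0, T, trials=400, seed=0):
    """Ensemble accuracy and data-precision variances of a one-parameter (mean) fit.

    The record is random-walk noise started at t = 0 with x_c(t) = 0, and the fit
    window covers samples t0 ... t0 + T - 1.
    """
    rng = np.random.default_rng(seed)
    walk = np.cumsum(rng.normal(size=(trials, t0 + T)), axis=1)
    window = walk[:, t0:t0 + T]
    a0 = window.mean(axis=1)                        # fitted constant a_0
    var_w = np.mean(a0 ** 2)                        # accuracy: fit minus truth (truth is 0)
    var_j = np.mean((window - a0[:, None]) ** 2)    # data precision about the fit
    return var_w, var_j

if __name__ == "__main__":
    for t0 in (0, 1024, 4096):
        var_w, var_j = mean_fit_variances(t0, T=256)
        print(f"t0={t0:5d}: accuracy variance={var_w:9.1f}  data precision variance={var_j:6.1f}")
```

The accuracy variance should grow roughly linearly with $t_0$, while the data precision variance should stay near a value set by $T$ alone, which is the behavior the figure describes.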

The proper response here would be: (a) to theoretically analyze $p_d$ using the correct noise model [7-11], (b) to identify that the $\sigma_{w,1}$ infinity will occur before performing the experiment, and (c) to redesign the system ($H_s(f)$) and/or reformulate the question so that the accuracy infinity will not occur [7-11]. This last step often involves the introduction of periodic calibration [11].

For $|f|^{-3}$ noise, note that $\sigma_{j,1} \to \infty$ as $t_g \to \infty$, but that $\sigma_{j,2}$ will remain finite [7,8]. Thus, if one is interested in obtaining a finite data precision, the proper response to a $\sigma_{j,M}$ infinity is to change the estimation model M-order, not to arbitrarily switch to an Allan or Hadamard variance and leave the model function $x_{a,M}(t)$ untouched.

Finally, we note that $\sigma_{w,M}^2 \propto \ln(f_h t_0)$ for $|f|^{-1}$ noise and $H_s(f)$ given by a low-pass cut-off $f_h$ [2]. Thus, even though the $|f|^{-1}$ contribution to $\sigma_{w,M}^2$ is strictly infinite as $t_0 \to \infty$, practically, this $|f|^{-1}$ contribution is often smaller than the white noise contribution, even when $t_0$ is the age of the universe [17].


III.  MYTH  2:  ONE  CAN  OBTAIN  PROPER  ESTIMATION  RESULTS 
FROM  (NON-HP-FILTERED)  NEG-P  NOISE  CONTAINING  DATA 

In this section, we will show that non-HP-filtered neg-p noise behaves more like systematic error than conventional noise. Thus, estimation techniques like least squares and Kalman filters can generate severely anomalous results when such noise is present, since estimation techniques have difficulty separating systematic error from true behavior [5]. We will show that this systematic-like behavior is due to the non-ergodic and infinitely correlated behavior of non-HP-filtered neg-p noise. In the next sections, we will show that neg-p noise has these properties using the non-stationary (NS) and wide-sense stationary (WSS) pictures of a random process [2,10,11,14,18].

A.  The  NS  and  WSS  Pictures  of  a  Random  Process 

Two covariant representations or pictures are generally used when discussing a random process $x_r(t)$. The more inclusive, but less familiar, one is the non-stationary (NS) picture summarized in Table II [2,18]. In the NS picture, $x_r(t)$ is zero for $t < 0$ (or before some other finite start time). Because of this finite start time, the noise covariance or autocorrelation function of the process $R_r(t_g,\tau)$ [14,18] (the same here because we are assuming $E\,x_r(t) = 0$) has two time arguments: $t_g$, the global or average time from the start of the noise process ($t = 0$), and $\tau$, the difference or local time between the covariant $x_r$ arguments [18]. The other, less inclusive picture is the more familiar wide-sense stationary (WSS) one [2,12,14,19]. This picture is also summarized in Table II. Here, $x_r(t)$ is non-zero for all $t$, and $x_r(t)$ is assumed to be statistically time invariant so that the autocorrelation function is now given by $R_r(\tau)$. We note that $x_r(t)$ must also be statistically bounded for the process to be WSS, because many WSS theorems require such bounded behavior for their proof [19]. This bounded behavior is often ignored in descriptions of neg-p noise, which can lead to erroneous conclusions. Even when the underlying noise process is inherently WSS, note that one must use the NS $R_r(t_g,\tau)$ for small $t_g$, because of initial start-up transients [2].

As shown in Table II, an NS process has three different covariant spectral functions that are formed by the Fourier Transforms (FTs) of $R_r(t_g,\tau)$ with respect to various combinations of $t_g$ and $\tau$ [18]. The Wigner-Ville function $W_r(t_g,f)$ in the table can be interpreted as a $t_g$-dependent PSD and is the most physically intuitive for neg-p noise analysis. The Loeve spectrum $L_r(f_g,f)$ in the table, on the other hand, is useful for simplifying analytical expressions [2,9]. The ambiguity function $A_r(f_g,\tau)$ in the table is included here for completeness and is used in signal processing [18]. In the WSS picture, the SSB PSD $L_r(f)$ defined in Table II is the well-known spectral function formed by taking the complex Fourier transform of $R_r(\tau)$ with respect to $\tau$ [12,14,19]. Note that one can also use the double-sideband PSD $S_r(f)$, as is common practice in time and frequency papers [1].

An important measure of the behavior of a random process is its correlation time $\tau_c$, which is defined in Table II. Note that this definition is an extension of a WSS one [20] to include the NS picture in the limit of $t_g \to \infty$. $\tau_c$ is an important parameter in statistical estimation, because $N_i = T/\tau_c$ represents the number of statistically independent samples over $T$ [20]. Thus, averaging over N samples reduces errors by some power of $N_i$ (not N) when the relevant noise process is correlated, and only when $T \gg \tau_c$ [10,11]. This will be very important in later discussions.
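The distinction between $N$ and $N_i$ can be made concrete with an exactly computable case. The sketch below assumes a sampled process with exponential autocorrelation $R(k) = \sigma^2 e^{-|k|/\tau_c}$ (a Gauss-Markov model chosen here for illustration), unit sample spacing so that $T = N$, and the Table II definition of $\tau_c$, for which the exponential decay time equals $\tau_c$. The function name is hypothetical.

```python
import numpy as np

def mean_variance(N, tau_c, sigma2=1.0):
    """Exact variance of the N-sample mean for R(k) = sigma2 * exp(-|k| / tau_c)."""
    k = np.arange(1, N)
    rho = np.exp(-k / tau_c)
    return sigma2 / N * (1.0 + 2.0 * np.sum((1.0 - k / N) * rho))

if __name__ == "__main__":
    N, tau_c = 10_000, 100.0
    exact = mean_variance(N, tau_c)
    print(f"exact variance of the mean       : {exact:.5f}")
    print(f"white-noise prediction, 1/N      : {1.0 / N:.5f}")
    print(f"correlated prediction, 2*tau_c/T : {2.0 * tau_c / N:.5f}")
```

When $T \gg \tau_c$, the variance of the mean scales as $1/N_i$ rather than $1/N$, so treating the samples as independent understates the error by a factor of order $\tau_c$ in this model.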

Finally, note from Table II that one can relate the NS picture to the WSS picture by letting $t_g \to \infty$ ($t_g \to \infty$ is equivalent to letting the $x_r(t)$ start time go to $-\infty$) [2,21]. Two important NS to WSS theorems based on this are also given in Table II [2,11]. These theorems will play a prominent role in understanding the true statistical properties of neg-p noise, as we will discuss in the next section.


B.  The  Statistical  Properties  of  Neg-p  Noise 

One can show that non-HP-filtered neg-p noise has the basic properties listed in Table III [2]. Note that the WSS $R_p(\tau)$ is infinite for all $\tau$, because $R_p(\tau) = \lim_{t_g \to \infty} R_p(t_g,\tau)$, as given in Table II. Thus, the WSS $R_p(\tau)$ is strictly indefinable. However, because $L_p(f) = \lim_{t_g \to \infty} W_p(t_g,f)$ and this limit is well-behaved for $f \ne 0$, one can properly define the WSS $L_p(f)$ for $f \ne 0$ without the use of the WSS $R_p(\tau)$. Thus, one can interpret equations such as (2) as the $t_g \to \infty$ limit of the NS picture and properly apply them to neg-p problems.

Another important property of non-HP-filtered neg-p noise listed in Table III is that its $\tau_c$ is infinite. This means that $N_i$ is effectively zero for all $T$, which is one of the factors that leads to the anomalous neg-p fitting behavior that we will discuss later.

A very important (and not well-known) property of non-HP-filtered neg-p noise is that it is intrinsically non-ergodic; that is, the infinite time average $\langle x_p(t) \rangle_{T \to \infty}$ is not equal to the ensemble average $E\,x_p(t)$ for such noise [10,11,14,22]. This is a consequence of a theorem stating that an NS random process $x_p(t)$ is ergodic if and only if $R_p(t_g,\tau)$ is bounded for all $t_g$ (including $\infty$) and the last $x_p(t)$ point in $\langle x_p(t) \rangle_T$ becomes decorrelated with $\langle x_p(t) \rangle_T$ as $T \to \infty$ [22]. This decorrelation property can be shown to imply that $\tau_c$ must be finite. This is another factor that leads to the anomalous fitting behavior of neg-p noise, which we will now discuss.
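Non-ergodicity can be seen directly by comparing time averages across ensemble members. The sketch below uses a random walk started at $t = 0$ as a simple stand-in for non-HP-filtered neg-p noise and white noise as the ergodic reference; both have an ensemble mean of zero. The function name and sample sizes are illustrative assumptions.

```python
import numpy as np

def time_average_spread(T, trials=500, seed=0):
    """Ensemble variance of the T-sample time average for white and random-walk noise."""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=(trials, T))
    walk = np.cumsum(white, axis=1)          # random walk started at t = 0
    return np.var(white.mean(axis=1)), np.var(walk.mean(axis=1))

if __name__ == "__main__":
    for T in (100, 400, 1600):
        var_white, var_walk = time_average_spread(T)
        print(f"T={T:5d}: white={var_white:.5f}  random walk={var_walk:.1f}")
```

For white noise the time average of each member closes in on the ensemble mean as $T$ grows. For the random walk the spread of the time averages increases with $T$, so a single member's time average never settles on the ensemble value.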
Table II. Representations of a random process.

Non-Stationary (NS) Picture

  $x_r(t) = 0$ for $t < 0$
  $t_g$ = Global or average time from start of noise process
  $\tau$ = Local or difference time
  $E$ = Ensemble average
  Covariance or correlation function ($x_r(t)$ real, $E\,x_r(t) = 0$): $R_r(t_g,\tau) = E\,x_r(t_g + \tau/2)\,x_r(t_g - \tau/2)$, with $R_r(t_g,\tau) = 0$ for $t_g < 0$ or $|\tau| > 2 t_g$
  Wigner-Ville function: $W_r(t_g,f) = \mathcal{F}_{f,\tau} R_r(t_g,\tau)$
  Loeve spectrum: $L_r(f_g,f) = \mathcal{F}_{f_g,t_g} W_r(t_g,f)$
  Ambiguity function: $A_r(f_g,\tau) = \mathcal{F}_{f_g,t_g} R_r(t_g,\tau)$
  Correlation time: $\tau_c = \lim_{t_g \to \infty} 0.5\, R_r(t_g,0)^{-1} \int_{-\infty}^{+\infty} d\tau\, R_r(t_g,\tau)$
  Number of statistically independent samples over $T$: $N_i = T/\tau_c$
  (Complex) Fourier transform: $V(f) = \mathcal{F}_{f,t}\, v(t) = \int_{-\infty}^{+\infty} dt\, \exp(-j\omega t)\, v(t)$   [$\omega = 2\pi f$]

Wide-Sense Stationary (WSS) Picture

  $x_r(t) \ne 0$ for all $t$
  Covariance or correlation function ($x_r(t)$ real, $E\,x_r(t) = 0$): $R_r(\tau) = E\,x_r(t_g + \tau/2)\,x_r(t_g - \tau/2)$
  (SSB) power spectral density: $L_r(f) = \mathcal{F}_{f,\tau} R_r(\tau)$

NS → WSS Theorems [4,21]

  $R_r(\tau) = \lim_{t_g \to \infty} R_r(t_g,\tau)$
  $L_r(f) = \lim_{t_g \to \infty} W_r(t_g,f) = \lim_{f_g \to 0} j\omega_g L_r(f_g,f)$
  The $L_r(f_g,f)$ form of $L_r(f)$ is derived from the Laplace Final Value Theorem.

Table III. Properties of non-highpass-filtered neg-p noise.

NS Picture:

  $R_p(t_g,\tau) < \infty$ for $t_g < \infty$
  $R_p(t_g,\tau) = \infty$ for $t_g \to \infty$
  $W_p(t_g,f) < \infty$ for all $f$, $t_g < \infty$
  Note bandlimiting needed to make $W_p(t_g,\tau)$ finite for $p > -1$

WSS Picture:

  $R_p(\tau) = \lim_{t_g \to \infty} R_p(t_g,\tau) = \infty$
  WSS $R_p(\tau)$ is undefinable for all $\tau$.
  $L_p(f) = \lim_{t_g \to \infty} W_p(t_g,f) \propto |f|^p$
  $R_p(\tau)$ not needed to define $L_p(f)$

Neg-p noise has an infinite correlation time: $\tau_c = \infty$

Neg-p noise is inherently non-ergodic: $E\,x_p(t) \ne \langle x_p(t) \rangle_T$
C. Anomalous Fitting Behavior in Finite Data Sets

Now let us investigate how the non-ergodicity and infinite $\tau_c$ of neg-p noise affect practical implementations of fitting techniques, such as least squares and Kalman filters. Fig. 5 shows a simulated unweighted LSQF for both p = 0 and p = -3 noise when: (a) both the true behavior $x_c(t)$ and the model function $x_{a,M}(t)$ are 2nd-order polynomials (M = 3), (b) $H_s(f)$ is an ideal Nyquist LP filter, and (c) the uncorrelated $p_{d0}$ is used to predict the error bars $x_{a,M}(t) \pm \sigma_{wj,M}$ (which are so small in the figures that they appear coalesced with $x_{a,M}(t)$). In the p = 0 case shown on the left of Fig. 5, note that the fit behaves as ensemble-based white-noise fitting theory predicts; that is, $x_c(t)$ and $x_{a,M}(t)$ fall on top of each other for the large N used, and $\sigma_{wj,M}$ properly predicts $\sigma_{w,M}$.

For the p = -3 case on the right of the figure, however, note that $x_{a,M}(t)$ significantly deviates from $x_c(t)$, while the white-noise-based error bars do not properly predict this deviation.


Fig. 5. Unweighted LSQF with p = 0 and p = -3 (N=2048). Right panel: $f^{-3}$ noise. Curves: $x_c$, $x_{a,M}$, and $x_{a,M} \pm \sigma_{wj,M}$.


What is happening here is that the particular $x_r(t_n)$ ensemble member in this example has behavior that is 2nd-order polynomial-like and, thus, substantially correlated with $x_c(t)$, so that LSQF cannot separate this correlated noise component from the true behavior of $x_c(t)$ [5]. This is what generates the anomalous fitting results; the fit interprets the correlated noise as being part of $x_c(t)$. Such anomalous behavior is well known as the result of correlated systematic error [5] and is a specific example of a more general principle: linearly dependent variables cannot be separated by any solution technique, because the determinant of the solution matrix goes to zero [24]. This systematic-like behavior in the neg-p noise case is a direct result of its infinitely correlated and non-ergodic nature. Thus, E-averaged theory predictions do not represent the behavior of individual ensemble members over the data collection interval.
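The failure of white-noise error bars on a single ensemble member can be sketched in a few lines. Note the substitution: this example uses random-walk ($f^{-2}$) noise rather than the $f^{-3}$ noise of Fig. 5, because it can be generated with a simple cumulative sum. The quadratic coefficients, the unit step variance, and the function names are all illustrative assumptions, and the result is not a reproduction of the paper's simulation.

```python
import numpy as np

def quadratic_lsqf_errors(noise, coeffs_true=(1.0, -2.0, 0.5)):
    """Fit a 2nd-order polynomial (M = 3) to x_c + noise.

    Returns (true rms fit error, rms error bar predicted with the white-noise p_d0).
    """
    N, M = noise.size, 3
    t = np.linspace(0.0, 1.0, N)
    x_c = np.polyval(coeffs_true[::-1], t)             # true quadratic behavior
    x = x_c + noise
    fit = np.polyval(np.polyfit(t, x, M - 1), t)
    true_rms = np.sqrt(np.mean((fit - x_c) ** 2))      # accuracy (needs x_c, unobservable in practice)
    resid_var = np.sum((x - fit) ** 2) / N             # observed data precision with c_n = 1/N
    predicted_rms = np.sqrt(M / (N - M) * resid_var)   # white-noise error bar using p_d0
    return true_rms, predicted_rms

def demo(N=2048, seed=0):
    """Return the (true, predicted) rms pairs for white noise and for a random walk."""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=N)
    walk = np.cumsum(rng.normal(size=N))               # f^-2 stand-in for the paper's f^-3 noise
    return quadratic_lsqf_errors(white), quadratic_lsqf_errors(walk)

if __name__ == "__main__":
    (tw, pw), (tr, pr) = demo()
    print(f"white noise: true rms = {tw:.4f}, predicted = {pw:.4f}")
    print(f"random walk: true rms = {tr:.4f}, predicted = {pr:.4f}")
```

In the white-noise case the predicted error bar is of the same order as the true fit error. In the random-walk case the fit absorbs the polynomial-like part of the noise, so the residuals stay small and the predicted error bar falls far below the true error.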

An important consequence of the above is that noise whitening, a procedure meant to determine the true structure of $x_c(t)$ by increasing M in the model function $x_{a,M}(t)$ until the residuals $x_{j,M}(t_n)$ are uncorrelated [5], will not properly identify $x_c(t)$ when non-HP-filtered neg-p noise is present. That is, the truth model for such noise whitening is uncorrelated noise plus true behavior, and neg-p noise is highly correlated. Note that p = -1 noise can be a marginal case when white noise is also present, because of the slow growth of this noise as $t_g$ becomes large [17].

Fig. 6 shows that Kalman filters also exhibit such anomalous behavior when neg-p noise is present.
Here, simulation results are shown for both p = 0 and p = -2 noise (measurement noise, not process noise
[6,12]) when xc(t) and xa,M(t) are again both quadratic polynomials. In the middle graph, where p = -2 and
an uncorrelated noise model is used [4], one again sees the characteristic veering off of xa,M(t) from xc(t)
and the gross underestimation of the true errors by the error bars. What is even more interesting is the right
graph. Here again, p = -2, but the Kalman filter is augmented with a random walk measurement noise
model, which is supposed to correct for such anomalous behavior [6]. One can see that xa,M(t) is closer to
xc(t) and the error bars do a better job of estimating the true error. However, there are still substantial
deviations of xa,M(t) from xc(t), and the error bars still underestimate the true error. The culprit here is the
non-ergodic-like behavior of the neg-p noise and the mimicking of the true behavior by the neg-p noise. It
is expected that p = -3 noise would exhibit even more significant anomalous behavior with an augmented
Kalman filter, because p = -3 noise looks like a slowly changing random drift [2]. However, the author has
not demonstrated this yet.
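
The underestimation of the true error by the error bars in the uncorrelated-model case can be sketched as follows. This is an illustrative toy, not the simulation behind Fig. 6: the function names, polynomial coefficients, diffuse prior, record length, and trial count are all assumptions, and the white measurement noise variance handed to the filter is simply matched to each noise record. The filter treats the polynomial coefficients as a static state (no process noise), and the actual error at the end of the record is compared with the filter's own formal sigma.

```python
import numpy as np

def kalman_poly_fit(z, t, r_var, order=2):
    """Kalman filter for static polynomial coefficients, assuming white
    measurement noise of variance r_var (no process noise)."""
    n_state = order + 1
    a = np.zeros(n_state)                # coefficient estimate
    P = np.eye(n_state) * 1.0e4          # diffuse prior (assumed)
    for zk, tk in zip(z, t):
        h = tk ** np.arange(n_state)     # measurement row [1, t, t^2]
        s = h @ P @ h + r_var            # innovation variance
        k = P @ h / s                    # Kalman gain
        a = a + k * (zk - h @ a)         # measurement update
        P = P - np.outer(k, h @ P)
    return a, P

def normalized_end_error(noise_kind, rng, n=200, n_trials=50):
    """RMS over trials of (actual error) / (formal sigma) at the last sample."""
    t = np.linspace(0.0, 1.0, n)
    coeffs = np.array([1.0, -2.0, 3.0])  # hypothetical quadratic xc(t)
    h_end = t[-1] ** np.arange(3)
    ratios = []
    for _ in range(n_trials):
        w = rng.standard_normal(n)
        noise = w if noise_kind == "white" else np.cumsum(w)
        z = P_eval(coeffs, t) + noise
        a, P = kalman_poly_fit(z, t, r_var=np.var(noise))
        err = h_end @ (a - coeffs)           # true error of the estimate
        sigma = np.sqrt(h_end @ P @ h_end)   # filter's formal error bar
        ratios.append(err / sigma)
    return float(np.sqrt(np.mean(np.square(ratios))))

def P_eval(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2."""
    return coeffs[0] + coeffs[1] * t + coeffs[2] * t**2

rng = np.random.default_rng(1)           # assumed seed
ratio_white = normalized_end_error("white", rng)
ratio_walk = normalized_end_error("walk", rng)
print(f"white noise (p = 0):  rms(error/sigma) = {ratio_white:.2f}")
print(f"random walk (p = -2): rms(error/sigma) = {ratio_walk:.2f}")
```

A ratio near 1 means the error bars are honest. With these assumed settings, the white-noise ratio should sit near 1, while the random walk ratio should be several times larger, which is the gross underestimation of the true error described above.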


[Figure panel titles recoverable from the extracted text: "f^-2 Noise, White Noise Model" and
"f^-2 Noise, Correlated Noise Model". Plotted quantities: xc and xa,M with error bars.]

Fig. 6. Anomalous neg-p behavior in a Kalman filter.


D. Ergodicity, τc, and Proper Fitting Behavior

Fig. 7 illustrates that neg-p-like anomalous fitting behavior also occurs in ergodic WSS but correlated
processes when one does not have T/τc » 1. Shown here is a (non-augmented) Kalman simulation using
stationary Gauss-Markov noise (single-pole lowpass-filtered white noise [12,14]). This noise is both
ergodic and WSS, but has a τc related to the reciprocal of the lowpass knee frequency of the noise filter.
One can observe from the figure that the Kalman results behave as theoretically expected for T/τc » 1, but
become more and more anomalous as T/τc approaches 1. This occurs because the single noise ensemble
members averaged over T here do not behave like their ensemble-averaged counterparts when T
approaches τc. That is, the noise is not ergodic-like over T (E{...} ≠ <...>T) when we don't have T/τc » 1
[11]. One can see here that strict ergodicity, E{...} = <...>T→∞ [14,22], does not guarantee such ergodic-like
behavior over any T. Another way to view this is that there are not enough statistically independent
samples, roughly T/τc of them, when we don't have T/τc » 1 for the fit to be statistically meaningful. We
note that works on ensemble-based fitting theory [5,6,12] often implicitly assume E{...} = <...>T as N→∞
for any T, but we have just shown this is not the case for correlated noise processes. This assumption for
T→0 is called local ergodicity [25]. To coin a phrase, a noise process with a substantial τc is intermediate
ergodic; that is, one must have T/τc » 1 for the process to be ergodic-like.
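
The dependence on T/τc can be made concrete with a small sketch. It is an illustration under stated assumptions, not the Kalman simulation of Fig. 7: the Gauss-Markov noise is modeled as a discrete AR(1) sequence with unit variance and τc measured in samples, the fitted "polynomial" is just the mean, and the function name, seed, and trial count are hypothetical. For each T/τc value listed in the Fig. 7 caption, it estimates how many statistically independent samples a record of length T effectively contains, from the spread of the time average across an ensemble.

```python
import numpy as np

def gauss_markov(n, tau_c, rng, n_trials):
    """Ensemble of stationary, unit-variance first-order Gauss-Markov
    (AR(1)) records of length n, with correlation time tau_c in samples."""
    phi = np.exp(-1.0 / tau_c)
    x = np.empty((n_trials, n))
    x[:, 0] = rng.standard_normal(n_trials)   # start in the stationary state
    drive = np.sqrt(1.0 - phi**2) * rng.standard_normal((n_trials, n))
    for k in range(1, n):
        x[:, k] = phi * x[:, k - 1] + drive[:, k]
    return x

rng = np.random.default_rng(2)   # assumed seed
n = 2000                         # record length T in samples (assumed)
results = {}
for tau_c in (1, 10, 100, 1000):
    x = gauss_markov(n, tau_c, rng, n_trials=400)
    # For unit-variance noise, N independent samples would give a time
    # average with variance 1/N, so 1/variance is an effective sample count.
    var_of_time_avg = x.mean(axis=1).var()
    results[n // tau_c] = 1.0 / var_of_time_avg

for ratio, n_eff in results.items():
    print(f"T/tau_c = {ratio:5d}: effective independent samples ~ {n_eff:8.1f}")
```

For an exponential autocorrelation the effective count approaches T/(2τc) when T/τc » 1, so it agrees with T/τc only to within a factor of order 2. The point of the sketch is the trend: as T/τc approaches 1, a single record contains only one or two effectively independent samples, and its time average is a poor stand-in for the ensemble average.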


Fig. 7. Kalman filter simulations for a correlated Gauss-Markov process: (a) T/τc = 2000,
(b) T/τc = 200, (c) T/τc = 20, (d) T/τc = 2.




41st Annual Precise Time and Time Interval (PTTI) Meeting


Finally, what is obvious from the above discussion is that non-highpass-filtered neg-p noise will generate
such anomalous polynomial fitting behavior for any T, because τc = ∞. Again, |f|^-1 noise can often be a
marginal exception, because τc is large, but not infinite, and white noise effects can dominate.


IV.  CONCLUSIONS 

In  this  paper,  we  have  shown  that  one  cannot  simply  swap  variances  to  “fix”  a  neg-p  divergence  problem 
without  further  action.  We  have  shown  that  each  type  of  variance  is  a  statistical  answer  to  a  different  error 
question  and  such  arbitrary  swapping  merely  masks  the  true  problem  that  caused  the  divergence.  We  have 
also  shown  that  such  neg-p  variance  divergences  are  true  indicators  of  estimation  problems  that  must  be 
physically  addressed,  not  ignored.  Furthermore,  we  have  shown  that  non-highpass-filtered  neg-p  noise  acts 
like  systematic  error  and  generates  anomalous  behavior  in  statistical  estimation  techniques  like  least 
squares  and  Kalman  filters  when  the  estimation  functions  consist  of  polynomials.  It  has  also  been  shown 
that  this  systematic  behavior  is  due  to  the  non-ergodic  and  infinitely  correlated  nature  of  such  neg-p  noise. 
As a final note, the surest way to reduce the anomalous effects of neg-p noise is to develop frequency
standards with lower neg-p noise. This is good news for frequency standards developers.